78 research outputs found

    Pertinence des trois premiers formants des voyelles orales dans la caractérisation du locuteur

    The work reported in this article falls within the framework of automatic speaker characterization. In this perspective, we are carrying out a study of the relevance of several phonetic and acoustic parameters, the first part of which concerns the first three formants of certain French oral vowels. The goal is to determine which oral vowels are best suited to automatic speaker recognition and, for each of them, which formants or combinations of formants are the most discriminating. After describing the construction and labelling of the corpus from which these vowels are taken, we detail the method used to determine the first three formants of the vowels. We then present the relevance indicators used to rank the vowels and the formant combinations, and conclude with a selection of results.

    Extraction of formants of oral vowels and critical analysis for speaker characterization

    Methods for achieving automatic speaker recognition may be classified into two categories: pattern recognition based approaches that implicitly use the interspeaker and intraspeaker variability of speech, and approaches which explicitly take into account the sources of interspeaker and intraspeaker differences. The latter examine linguistic units in order to extract features which are relevant for speaker characterization. The aim of the present paper is precisely to study the relative effectiveness of the first three formants of different French vowels for speaker characterization. As part of a larger set of preselected acoustic and phonetic parameters, the seven French vowels / i /, / e /, / E /, / 0 /, / a /, / O /, / u /, with a neutral bilabial previous context / p /, / b / and a lengthening subsequent context / R /, have been studied. For that purpose, we recorded and digitized a set of seventeen sentences, uttered four times by ten male speakers coming from the same region. In order to isolate the trigrams / p-vowel-R / and / b-vowel-R /, we hand-labelled the sentences according to strict rules. We then established an automatic method to determine very reliable values of the frequencies of the first three formants of the selected vowels. The retained frequencies were then used to conduct a speaker identification experiment, whose aim was to identify an unknown speaker from a group of ten known speakers using his utterance of a given vowel. To this end, a speaker was represented by a vector of one, two or three formant frequencies, or by a vector of one, two or three differences between two formant frequencies. For each vowel and for each type of vector, i.e. each combination of formant frequencies, three "relevance indicators" were computed: the global speaker identification rate, the sum of the recognition ranks of every speaker, and the ratio of intraspeaker to interspeaker inertia. These indicators were established for five kinds of weighted distances, among which a perceptual one. In the first part of this paper, we present our methodology for evaluating the formant frequencies of the vowels and we discuss the reliability of the results. In the second part, we examine the relative effectiveness of every vowel for each combination of formant frequencies, focusing on an interpretation of the results with respect to the speech production process. We also compare our results with those obtained in normalization studies, in particular with the non-uniform female/male formant frequency ratios (ki).
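    A minimal sketch of the kind of computation the "relevance indicators" above involve, assuming a simple array layout (speakers x repetitions x three formant frequencies), an unweighted Euclidean distance and synthetic formant values; none of this is the authors' code or data.

        import numpy as np

        rng = np.random.default_rng(0)
        # Hypothetical F1/F2/F3 values (Hz) for one vowel: 10 speakers, 4 repetitions each.
        formants = rng.normal(loc=[500.0, 1500.0, 2500.0],
                              scale=[40.0, 80.0, 120.0], size=(10, 4, 3))

        train, test = formants[:, 1:, :], formants[:, 0, :]   # 3 repetitions to model, 1 to test
        centroids = train.mean(axis=1)                        # one reference vector per speaker
        global_mean = train.reshape(-1, 3).mean(axis=0)

        # Ratio of intraspeaker to interspeaker inertia (lower = more discriminant vowel).
        intra = ((train - centroids[:, None, :]) ** 2).sum()
        inter = (train.shape[1] * ((centroids - global_mean) ** 2).sum(axis=1)).sum()
        print("intra/inter inertia ratio:", intra / inter)

        # Global speaker identification rate: assign each test token to the nearest centroid.
        dists = np.linalg.norm(test[:, None, :] - centroids[None, :, :], axis=-1)
        print("identification rate:", (dists.argmin(axis=1) == np.arange(10)).mean())

    The third indicator, the sum of recognition ranks, could be read off the same distance matrix by ranking, for each test token, the distance to its true speaker.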

    Detection of Phone Boundaries for Non-Native Speech using French-German Models

    Within the framework of computer-assisted foreign language learning for the French/German pair, we evaluate different HMM phone models for detecting accurate phone boundaries. The optimal parameters are determined by minimizing, on the non-native speech corpus, the number of phones whose boundaries are shifted by more than 20 ms compared to the manual boundaries. We observe that the best performance was obtained by combining a French native HMM model with an automatically selected German native HMM model.
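    A small illustration of the selection criterion described above, i.e. counting the automatic phone boundaries that fall more than 20 ms away from the manual ones; the boundary times and their one-to-one pairing are made up for the example.

        def shifted_boundaries(auto, manual, tol=0.020):
            """Count boundaries placed more than `tol` seconds from the manual reference."""
            return sum(abs(a - m) > tol for a, m in zip(auto, manual))

        # Hypothetical boundary times (in seconds) for one utterance.
        manual = [0.12, 0.25, 0.40, 0.58, 0.77]
        auto   = [0.13, 0.28, 0.41, 0.55, 0.78]
        print(shifted_boundaries(auto, manual))   # -> 2

    The model combination retained is the one minimizing this count over the whole non-native corpus.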

    Semi-automatic phonetic labelling of large corpora

    The aim of the present paper is to present a methodology for the semi-automatic labelling of large corpora. This methodology is based on three main points: using several concurrent automatic stochastic labellers, decomposing the labelling of the whole corpus into an iterative refining process, and building a labelling comparison procedure which takes phonological and acoustic-phonetic rules into account to evaluate the similarity of the various labellings of one sentence. After detailing these three points, we describe our HMM-based labelling tool and the application of this methodology to the Swiss French POLYPHON database.

    Two Tools for Semi-automatic Phonetic Labelling of Large Corpora

    This paper presents two tools allowing a reliable semi-automatic labelling of large corpora: an automatic HMM-based labelling tool and an assessment and decision system that validates the automatically labelled sentences. The decision system uses the results supplied by another automatic labeller and compares the two labellings with a parametrisable comparison process. We also propose a generic methodology to improve the labelling accuracy and to reduce the manual verification step.
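    The comparison step can be pictured with a toy decision rule, assuming two phone sequences produced by the two labellers and a parametrisable set of tolerated phonological confusions (the pairs below are invented, not the rules used by the authors):

        TOLERATED = {("e", "E"), ("E", "e"), ("o", "O"), ("O", "o")}   # hypothetical confusions

        def labellings_agree(seq_a, seq_b, tolerated=TOLERATED):
            if len(seq_a) != len(seq_b):                 # insertions/deletions count as disagreement
                return False
            return all(a == b or (a, b) in tolerated for a, b in zip(seq_a, seq_b))

        seq_a = ["R", "o", "z"]                          # "rose" labelled by the HMM-based tool
        seq_b = ["R", "O", "z"]                          # the second labeller prefers an open /O/
        print("validated" if labellings_agree(seq_a, seq_b) else "send to manual verification")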

    JTrans, an open-source software for semi-automatic text-to-speech alignment

    Aligning speech corpora with text transcriptions is an important requirement of many speech processing and data mining applications, as well as of linguistic research. Despite recent progress in the field of speech recognition, many linguists still manually align spontaneous and noisy speech recordings to guarantee a good alignment quality. This work proposes open-source Java software with an easy-to-use GUI that integrates dedicated semi-automatic speech alignment algorithms which can be dynamically controlled and guided by the user. The objective of this software is to facilitate and speed up the process of creating and aligning speech corpora.

    New Paradigm in Speech Recognition: Deep Neural Networks

    This paper addresses the topic of deep neural networks (DNNs). Recently, DNNs have become a flagship technique in artificial intelligence. Deep learning has surpassed state-of-the-art results in many domains: image recognition, speech recognition, language modelling, parsing, information retrieval, speech synthesis, translation, autonomous cars, gaming, etc. DNNs have the ability to discover and learn complex structure in very large data sets. Moreover, they have a great capacity for generalization. More specifically, this paper focuses on speech recognition with DNNs. We present an overview of different architectures and training procedures for DNN-based models. In the framework of broadcast news transcription, our DNN-based system decreases the word error rate dramatically compared to a classical system.
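    As a purely illustrative companion to the overview, the sketch below builds the simplest kind of DNN acoustic model such systems use: a stack of affine layers and non-linearities mapping spliced acoustic features to posterior probabilities over phonetic states. The layer sizes, random weights and random inputs are placeholders, not the configuration of the broadcast-news system mentioned above.

        import numpy as np

        rng = np.random.default_rng(0)
        dims = [440, 1024, 1024, 1024, 3000]     # e.g. 11 frames x 40 features -> state posteriors
        params = [(rng.normal(scale=0.01, size=(i, o)), np.zeros(o))
                  for i, o in zip(dims[:-1], dims[1:])]

        def softmax(x):
            e = np.exp(x - x.max(axis=-1, keepdims=True))
            return e / e.sum(axis=-1, keepdims=True)

        def forward(frames):
            h = frames
            for k, (W, b) in enumerate(params):
                h = h @ W + b
                h = np.maximum(h, 0.0) if k < len(params) - 1 else softmax(h)   # ReLU, then output softmax
            return h

        posteriors = forward(rng.normal(size=(8, 440)))   # a batch of 8 spliced feature vectors
        print(posteriors.shape, posteriors.sum(axis=1))   # (8, 3000), each row sums to 1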

    De l'importance de l'homogénéisation des conventions de transcription pour l'alignement automatique de corpus oraux de parole spontanée

    In order to offer the scientific community a corpus for the study of contemporary written and spoken French (Corpus d'Étude pour le Français Contemporain, CEFC), the ANR project ORFEO (Outils et Recherches sur le Français Ecrit et Oral) decided to gather several existing spoken corpora on a single platform and to associate a set of annotation layers with each of them. The annotation layer closest to the audio signal is the result of the automatic phone- and word-level alignment of an audio file with its orthographic transcription. The corpora gathered in the project were orthographically transcribed by different laboratories, each using its own conventions, which are therefore heterogeneous. At LORIA, we developed the ASTALI software (Automatic Speech-Text ALIgnment) to perform the automatic phone- and word-level alignment of spoken corpora. The purpose of this article is to present the difficulties encountered when adapting our tool to align the various ORFEO corpora, given this heterogeneity of transcription conventions.
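    To make the problem concrete, here is a deliberately toy normalisation pass in which invented lab-specific markers (laughter, pauses, anonymised names) are rewritten to a single convention before alignment; the real ORFEO conventions and the ASTALI pre-processing are more involved.

        import re

        # Hypothetical lab-specific markers mapped to one common convention.
        RULES = [
            (r"\(rire\)|<laugh>", "[laughter]"),
            (r"\+\+?|#", "[pause]"),
            (r"XXX|\*anon\*", "[anonymised]"),
        ]

        def normalise(transcript):
            for pattern, replacement in RULES:
                transcript = re.sub(pattern, replacement, transcript)
            return transcript

        print(normalise("euh <laugh> je crois + que XXX est parti"))
        # -> "euh [laughter] je crois [pause] que [anonymised] est parti"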

    Inter-annotator agreement for a speech corpus pronounced by French and German language learners

    This paper presents the results of an investigation of inter-annotator agreement for the non-native and native French part of the IFCASL corpus. This large bilingual speech corpus for French and German language learners was manually annotated by several annotators. This manual annotation is the starting point that will be used both to improve the automatic segmentation algorithms and to derive diagnoses and feedback. Agreement is evaluated by comparing the manual alignments of seven annotators to the manual alignment of an expert, for 18 sentences. Whereas the results for the presence of the devoicing diacritic show a certain degree of disagreement between the annotators and the expert, consistency between the annotators and the expert is very good for temporal boundaries as well as for insertions and deletions. We find a good overall agreement on boundaries between annotators and expert, with a mean deviation of 7.6 ms and 93% of boundaries within 20 ms.
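    The two boundary-agreement figures quoted above (mean deviation, proportion within 20 ms) can be computed along these lines; the boundary times and the single-annotator setting are illustrative only.

        def boundary_agreement(annotator, expert, tol=0.020):
            deviations = [abs(a - e) for a, e in zip(annotator, expert)]
            mean_dev = sum(deviations) / len(deviations)
            within = sum(d <= tol for d in deviations) / len(deviations)
            return mean_dev, within

        expert    = [0.10, 0.31, 0.52, 0.70]        # hypothetical boundary times (seconds)
        annotator = [0.11, 0.30, 0.55, 0.70]
        mean_dev, within = boundary_agreement(annotator, expert)
        print(f"mean deviation: {mean_dev * 1000:.1f} ms, within 20 ms: {within:.0%}")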

    An enhanced automatic speech recognition system for Arabic

    Automatic speech recognition for Arabic is a very challenging task. Although the classical techniques for Automatic Speech Recognition (ASR) can be efficiently applied to Arabic speech recognition, it is essential to take the specificities of the language into consideration in order to improve system performance. In this article, we focus on Modern Standard Arabic (MSA) speech recognition. We introduce the challenges related to the Arabic language, namely its complex morphology and the absence of short vowels in written text, which leads to several potential, often conflicting, vowelizations for each grapheme. We develop an ASR system for MSA using the Kaldi toolkit. Several acoustic and language models are trained. We obtain a Word Error Rate (WER) of 14.42 for the baseline system and a 12.2 relative improvement by rescoring the lattice and by rewriting the output with the right hamza above or below Alif.
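    For readers unfamiliar with the metric, the WER figures above are ratios of word-level edit operations to reference length; a compact, language-independent implementation looks like this (the sample sentences are placeholders, not MSA output from the described system):

        def wer(reference, hypothesis):
            """Word Error Rate: (substitutions + deletions + insertions) / reference words, in %."""
            ref, hyp = reference.split(), hypothesis.split()
            d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                d[i][0] = i
            for j in range(len(hyp) + 1):
                d[0][j] = j
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,          # deletion
                                  d[i][j - 1] + 1,          # insertion
                                  d[i - 1][j - 1] + cost)   # substitution or match
            return 100.0 * d[-1][-1] / len(ref)

        print(wer("the cat sat on the mat", "the cat sit on mat"))   # -> 33.33...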